Even though the USA is a wealthy country, 40 million Americans, including 12 million children, are “food insecure.” The U.S. Department of Agriculture (USDA) defines food insecurity as an “economic and social condition of limited or uncertain access to adequate food.” Exactly what this definition means is a matter of metrics and their interpretation. Because food insecurity may affect different demographics differently, we should measure and understand those differences. Food insecurity can harm individuals’ health and economic well-being, and those consequences and their severity can also be measured. If food insecurity is weighing down a significant part of our society, and if we choose to do something about it, we should try to understand the factors that may be related to, or may be predictors of, food insecurity. And if we design and implement policies and programs to address food insecurity, we should be able to measure their impact on the problem.
The USDA collects and maintains data related to issues outlined above in a “Food Environment Atlas.” The fundamental measure of food insecurity comes from an annual household survey. The survey has questions to assess different experiences and behaviors that indicate food insecurity, such as being unable to afford balanced meals, cutting the size of meals because of too little money for food, or being hungry because of too little money for food. The food security status of a household is designated as secure, insecure, or very insecure based on the number of food-insecure conditions reported. This data is aggregated to the state level in the Atlas.
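The survey's scoring rule can be sketched as a simple threshold on the number of reported food-insecure conditions. The cut-points below are illustrative (they follow the shape of the USDA adult scale); the official survey instrument defines the exact thresholds.

```python
def classify_household(affirmed_conditions: int) -> str:
    """Map a household's count of reported food-insecure conditions
    to a status label. Cut-points are illustrative, not official."""
    if affirmed_conditions <= 2:
        return "secure"
    elif affirmed_conditions <= 5:
        return "insecure"
    return "very insecure"

print(classify_household(0))  # secure
print(classify_household(4))  # insecure
```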
A hierarchy of factors is believed to cause food insecurity: availability (of food sources in a geographic region), access (to those food sources by the population), and utilization (of the food sources by the population). The Atlas maintains three broad categories of data:
Food Choices—Indicators of the community's access to and acquisition of healthy, affordable food, such as: access and proximity to a grocery store; number of food stores and restaurants; expenditures on fast foods; food and nutrition assistance program participation; food prices; food taxes; and availability of local foods.
Health and Well-Being—Indicators of the community's success in maintaining healthy diets, such as: food insecurity; diabetes and obesity rates; and physical activity levels.
Community Characteristics—Indicators of community characteristics that might influence the food environment, such as: demographic composition; income and poverty; population loss; metro-nonmetro status; natural amenities; and recreation and fitness centers.
Together these amount to over 275 indicators of the food environment, collected at the county or state level. The data can be processed and analyzed to create maps showing how individual indicators vary across the country, charts showing how a single indicator varies over time, and correlations between individual factors. Data from pilot or new programs can potentially be combined with the Atlas data to judge the effectiveness of those programs.
The purpose of this project is to examine the factors affecting food insecurity at the county and state level throughout the US. We take our data from the Food Environment Atlas (located at https://www.ers.usda.gov/data-products/food-environment-atlas.aspx) and use data processing techniques to find relationships among several variables, including food access, food availability, assistance, prices, local foods, demographics, insecurity, and health.
import numpy as np
import datetime as dt
import pandas as pd
from math import sqrt
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets, linear_model
import statistics as stat
import xlrd
import json
import vincent
from IPython.display import Image
# vincent.core.initialize_notebook()
# world_topo = r'world-countries.topo.json'
# state_topo = r'us_states.topo.json'
# lake_topo = r'lakes_50m.topo.json'
# county_geo = r'us_counties.geo.json'
# county_topo = r'us_counties.topo.json'
# or_topo = r'or_counties.topo.json'
xl = pd.ExcelFile("DataDownload.xls")
xl.sheet_names
df_data = xl.parse("Supplemental Data - County")
df_acc = xl.parse("ACCESS")
df_acc = df_acc.drop(['State', 'County'], axis = 1)
df_sto = xl.parse("STORES")
df_sto = df_sto.drop(['State', 'County'], axis = 1)
df_res = xl.parse("RESTAURANTS")
df_res = df_res.drop(['State', 'County'], axis = 1)
df_ass = xl.parse("ASSISTANCE")
df_ass = df_ass.drop(['State', 'County'], axis = 1)
df_ins = xl.parse("INSECURITY")
df_ins = df_ins.drop(['State', 'County'], axis = 1)
df_pri = xl.parse("PRICES_TAXES")
df_pri = df_pri.drop(['State', 'County'], axis = 1)
df_loc = xl.parse("LOCAL")
df_loc = df_loc.drop(['State', 'County'], axis = 1)
df_hea = xl.parse("HEALTH")
df_hea = df_hea.drop(['State', 'County'], axis = 1)
df_soc = xl.parse("SOCIOECONOMIC")
df_soc = df_soc.drop(['State', 'County'], axis = 1)
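The nine parse-and-drop statements above all follow one pattern, which could be condensed into a loop over the sheet names (the sheet names are the ones in the workbook; `strip_geo` is a helper introduced here for illustration):

```python
import pandas as pd

SHEETS = ['ACCESS', 'STORES', 'RESTAURANTS', 'ASSISTANCE', 'INSECURITY',
          'PRICES_TAXES', 'LOCAL', 'HEALTH', 'SOCIOECONOMIC']

def strip_geo(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop the State/County columns that every indicator sheet repeats."""
    return frame.drop(['State', 'County'], axis=1)

# With the workbook open, the nine parse/drop pairs collapse to:
#   frames = {name: strip_geo(xl.parse(name)) for name in SHEETS}

# toy frame standing in for one parsed sheet (made-up values)
demo = pd.DataFrame({'FIPS': [1001], 'State': ['AL'],
                     'County': ['Autauga'], 'LACCESS_POP15': [18428.0]})
print(list(strip_geo(demo).columns))  # ['FIPS', 'LACCESS_POP15']
```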
print("""
Each sheet of the Atlas workbook repeats the State and County columns, so we dropped them
from every indicator sheet and kept a single copy via the supplemental data. We then merged
all of the sheets into one DataFrame on the county FIPS code; inner joins keep only the
counties that appear in every sheet.
""")
df = df_data.copy()
df = df.rename(columns = {'FIPS ': 'FIPS', 'State': 'State_temp', 'County': 'County_temp'})
df = df.merge(df_acc, how = 'inner', on = 'FIPS')
df = df.merge(df_sto, how = 'inner', on = 'FIPS')
df = df.merge(df_res, how = 'inner', on = 'FIPS')
df = df.merge(df_ass, how = 'inner', on = 'FIPS')
df = df.merge(df_ins, how = 'inner', on = 'FIPS')
df = df.merge(df_pri, how = 'inner', on = 'FIPS')
df = df.merge(df_loc, how = 'inner', on = 'FIPS')
df = df.merge(df_hea, how = 'inner', on = 'FIPS')
df = df.merge(df_soc, how = 'inner', on = 'FIPS')
df = df.rename(columns = {'State_temp': 'State', 'County_temp': 'County'})
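The chain of inner merges above keeps only counties present in every sheet. A toy example (made-up FIPS codes and values) shows how a county missing from one sheet drops out:

```python
import pandas as pd

# two mock "sheets" keyed by FIPS; county 01003 is absent from the second
acc = pd.DataFrame({'FIPS': ['01001', '01003', '01005'],
                    'PCT_LACCESS_POP15': [5.2, 7.1, 3.8]})
ins = pd.DataFrame({'FIPS': ['01001', '01005'],
                    'FOODINSEC_13_15': [14.0, 18.5]})

merged = acc.merge(ins, how='inner', on='FIPS')
print(list(merged['FIPS']))  # ['01001', '01005'] -- 01003 has no match, so it is dropped
```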
# FIPS codes were read as integers, which drops leading zeros; pad back to 5 characters
df['FIPS'] = df['FIPS'].astype(str).str.zfill(5)
# population columns are stored as strings with thousands separators; strip commas -> int
pop_cols = (['2010 Census Population'] +
            ['Population Estimate, %d' % year for year in range(2011, 2017)])
for col in pop_cols:
    df[col] = df[col].str.replace(',', '').astype(int)
print("""
Two cleanup steps were needed before the merged data could be used. FIPS codes were read
from the spreadsheet as integers, which silently drops the leading zero from codes such as
01001, so we converted them back to zero-padded five-character strings. The census
population columns were stored as strings with thousands separators (e.g. "1,234,567"),
so we stripped the commas and converted the values to integers. This is all part of
tidying data.
""")
df
list(df.columns)
m = list(df.columns)
for i in range(len(m)):
    m[i] = str(m[i])
df = df[m]
print("""
As a final tidying step, we coerced every column label to a string so the DataFrame can be
indexed uniformly. The result is a single tidy DataFrame: one row per county, keyed by
FIPS, with the supplemental population figures and the Atlas indicators as columns.
""")
df
#Creating the standardized dataframe since results from the excel file vary by county population
norm_df = df.copy()
norm_factors = {}
for st in norm_df.columns:
    if st == 'FIPS' or st == 'State' or st == 'County':
        continue
    temp = list(norm_df[st])
    total = len(temp)
    mn = 0.0
    for i in range(len(temp)):
        if np.isnan(temp[i]):
            total -= 1          # NaN entries do not count toward the mean
        else:
            mn += temp[i]
    mn /= total
    st_dev = 0.0
    for i in range(len(temp)):
        if not np.isnan(temp[i]):
            st_dev += (mn - temp[i]) ** 2
    st_dev /= total             # population (not sample) standard deviation
    st_dev = sqrt(st_dev)
    if st_dev < 0.000001:
        st_dev = 1.0            # guard against division by ~zero for constant columns
    norm_factors[st] = [mn, st_dev]
    for i in range(len(temp)):
        if not np.isnan(temp[i]):
            temp[i] = (temp[i] - mn) / st_dev
    norm_df[st] = temp
#Standardized all inputs by setting mean to 0 and st.dev to 1 for each column
norm_df
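The column-by-column loop above computes a population z-score while skipping NaN entries. The same standardization can be written in a few pandas lines, which makes a handy cross-check (toy column, made-up values):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'x': [1.0, 2.0, np.nan, 3.0]})
# ddof=0 gives the population standard deviation, matching the manual loop;
# pandas skips NaN in both mean() and std() by default
z = (toy['x'] - toy['x'].mean()) / toy['x'].std(ddof=0)
print(round(z.mean(), 6), round(z.std(ddof=0), 6))  # mean ~0, st.dev ~1 after standardizing
```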
To map the indicators, we rendered county-level choropleths with the Vincent library, binding each indicator column to county shapes through the shared FIPS key in a US counties TopoJSON file, with state borders overlaid for readability. Each map uses a threshold color scale whose break points were chosen by hand to bring out the variation in that indicator.
Because the interactive Vega output only renders in a live notebook session, the map-building code is left commented out below and we display PNG images saved from an earlier run instead.
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='MEDHHINC15', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 20000, 30000, 40000, 50000, 70000, 100000, 150000]
# vis.legend(title='Median Household Income 2015')
# vis.to_json('vega.json')
# vis.display()
Image(filename='MEDHHINC15.png')
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='POVRATE15', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 2, 5, 10, 15, 20, 30, 50]
# vis.legend(title='Poverty Rate 2015')
# vis.to_json('vega.json')
# vis.display()
Image(filename='POVRATE15.png')
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='PCT_LACCESS_POP15', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 5, 10, 20, 30, 50, 75, 100]
# vis.legend(title='Percentage with Low Access 2015')
# vis.to_json('vega.json')
# vis.display()
Image(filename='PCT_LACCESS_POP15.png')
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='GROCPTH14', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 0.1, 0.25, 0.5, 1, 2, 3, 4]
# vis.legend(title='Grocery Stores per 1000 pop. 2015')
# vis.to_json('vega.json')
# vis.display()
Image(filename='GROCPTH14.png')
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='PCT_SNAP16', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 5, 7.5, 10, 12.5, 15, 20, 25]
# vis.legend(title='Percentage SNAP participation 2016')
# vis.to_json('vega.json')
# vis.display()
Image(filename='PCT_SNAP16.png')
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='PCT_WIC15', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 1, 1.25, 1.5, 1.75, 2, 2.5, 3]
# vis.legend(title='Percentage WIC participation 2016')
# vis.to_json('vega.json')
# vis.display()
Image(filename='PCT_WIC15.png')
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='FMRKTPTH16', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.5]
# vis.legend(title='Farmers Markets per 1000 pop. 2016')
# vis.to_json('vega.json')
# vis.display()
Image(filename='FMRKTPTH16.png')
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='FOODINSEC_13_15', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 7.5, 10, 12.5, 15, 17.5, 20, 25]
# vis.legend(title='Percentage Food Insecurity 2013-2015')
# vis.to_json('vega.json')
# vis.display()
Image(filename='FOODINSEC_13_15.png')
var_list = ['MEDHHINC15', 'POVRATE15', 'PCT_LACCESS_POP15', 'GROCPTH14', 'PCT_SNAP16', 'PCT_WIC15', 'FMRKTPTH16', 'FOODINSEC_13_15']
col_list = list(df.columns)
for name in col_list:
    if name not in var_list:
        del df[name]
for name in col_list:
    if name not in var_list:
        del norm_df[name]
# drop rows with NaN or infinite values in any of the selected variables
for name in var_list:
    df = df[np.isfinite(df[name])]
for name in var_list:
    norm_df = norm_df[np.isfinite(norm_df[name])]
df = df[var_list]
norm_df = norm_df[var_list]
corr_mat = norm_df.corr()
sns.heatmap(corr_mat, cmap = 'RdYlGn', annot = True)
def transpose(l):
    # wrap each value in its own list so sklearn receives an (n, 1) feature matrix
    new_l = []
    for i in range(len(l)):
        new_l += [[l[i]]]
    return new_l
sns.regplot(x='MEDHHINC15', y='PCT_LACCESS_POP15', data=df, fit_reg=True, marker='+')
lr = linear_model.LinearRegression().fit(transpose(list(df['MEDHHINC15'])), df['PCT_LACCESS_POP15'])
print("Coefficient: " + str(lr.coef_[0]))
print("Intercept: " + str(lr.intercept_))
sns.regplot(x='MEDHHINC15', y='GROCPTH14', data=df, fit_reg=True, marker='+')
lr = linear_model.LinearRegression().fit(transpose(list(df['MEDHHINC15'])), df['GROCPTH14'])
print("Coefficient: " + str(lr.coef_[0]))
print("Intercept: " + str(lr.intercept_))
sns.regplot(x='MEDHHINC15', y='PCT_SNAP16', data=df, fit_reg=True, marker='+')
lr = linear_model.LinearRegression().fit(transpose(list(df['MEDHHINC15'])), df['PCT_SNAP16'])
print("Coefficient: " + str(lr.coef_[0]))
print("Intercept: " + str(lr.intercept_))
sns.regplot(x='MEDHHINC15', y='PCT_WIC15', data=df, fit_reg=True, marker='+')
lr = linear_model.LinearRegression().fit(transpose(list(df['MEDHHINC15'])), df['PCT_WIC15'])
print("Coefficient: " + str(lr.coef_[0]))
print("Intercept: " + str(lr.intercept_))
sns.regplot(x='MEDHHINC15', y='FMRKTPTH16', data=df, fit_reg=True, marker='+')
lr = linear_model.LinearRegression().fit(transpose(list(df['MEDHHINC15'])), df['FMRKTPTH16'])
print("Coefficient: " + str(lr.coef_[0]))
print("Intercept: " + str(lr.intercept_))
sns.regplot(x='MEDHHINC15', y='FOODINSEC_13_15', data=df, fit_reg=True, marker='+')
lr = linear_model.LinearRegression().fit(transpose(list(df['MEDHHINC15'])), df['FOODINSEC_13_15'])
print("Coefficient: " + str(lr.coef_[0]))
print("Intercept: " + str(lr.intercept_))
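The six regression cells above repeat one pattern. A small helper (a sketch using the same sklearn API; `fit_simple` is a name introduced here) removes the duplication and replaces the transpose() workaround with a numpy reshape:

```python
import numpy as np
from sklearn import linear_model

def fit_simple(x, y):
    """Ordinary least squares of y on a single predictor x;
    returns (slope, intercept)."""
    X = np.asarray(x, dtype=float).reshape(-1, 1)  # (n, 1) feature matrix
    lr = linear_model.LinearRegression().fit(X, np.asarray(y, dtype=float))
    return lr.coef_[0], lr.intercept_

# e.g. slope, intercept = fit_simple(df['MEDHHINC15'], df['FOODINSEC_13_15'])
# sanity check on exact made-up data y = 2x + 1:
slope, intercept = fit_simple([0, 1, 2, 3], [1, 3, 5, 7])
print("Coefficient:", slope)    # ~2.0
print("Intercept:", intercept)  # ~1.0
```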